{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "4bf57366-3d41-4b9a-83c6-40dc134b6bc8",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "# Recipe: Multivariate Anomaly Detection with Isolation Forest\n",
    "This recipe shows how you can use SynapseML on Apache Spark for multivariate anomaly detection. Multivariate anomaly detection allows for the detection of anomalies among many variables or timeseries, taking into account all the inter-correlations and dependencies between the different variables. In this scenario, we use SynapseML to train an Isolation Forest model for multivariate anomaly detection, and we then use to the trained model to infer multivariate anomalies within a dataset containing synthetic measurements from three IoT sensors.\n",
    "\n",
    "To learn more about the Isolation Forest model please refer to the original paper by [Liu _et al._](https://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/icdm08b.pdf?q=isolation-forest)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "875fd304-7a57-4602-a8d0-12e5bbb7d928",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## Library imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "0c160433-2e79-4482-ae3d-b821878f78b5",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "from IPython import get_ipython\n",
    "from IPython.terminal.interactiveshell import TerminalInteractiveShell\n",
    "import uuid\n",
    "import mlflow\n",
    "\n",
    "from pyspark.sql import functions as F\n",
    "from pyspark.ml.feature import VectorAssembler\n",
    "from pyspark.sql.types import *\n",
    "from pyspark.ml import Pipeline\n",
    "\n",
    "from synapse.ml.isolationforest import *\n",
    "\n",
    "from synapse.ml.explainers import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "9257bf1d-b135-41d3-97db-1e96cc629b7d",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "from pyspark.sql import SparkSession\n",
    "\n",
    "# Bootstrap Spark Session\n",
    "spark = SparkSession.builder.getOrCreate()\n",
    "\n",
    "from synapse.ml.core.platform import running_on_synapse\n",
    "\n",
    "if running_on_synapse():\n",
    "    shell = TerminalInteractiveShell.instance()\n",
    "    shell.define_macro(\"foo\", \"\"\"a,b=10,20\"\"\")\n",
    "    from notebookutils.visualization import display"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "4e372238-4216-4037-a073-f2d2131e11bb",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## Input data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "336ece72-013f-4a2b-b8ce-0a8cef2afa4a",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# Table inputs\n",
    "timestampColumn = \"timestamp\"  # str: the name of the timestamp column in the table\n",
    "inputCols = [\n",
    "    \"sensor_1\",\n",
    "    \"sensor_2\",\n",
    "    \"sensor_3\",\n",
    "]  # list(str): the names of the input variables\n",
    "\n",
    "# Training Start time, and number of days to use for training:\n",
    "trainingStartTime = (\n",
    "    \"2022-02-24T06:00:00Z\"  # datetime: datetime for when to start the training\n",
    ")\n",
    "trainingEndTime = (\n",
    "    \"2022-03-08T23:55:00Z\"  # datetime: datetime for when to end the training\n",
    ")\n",
    "inferenceStartTime = (\n",
    "    \"2022-03-09T09:30:00Z\"  # datetime: datetime for when to start the training\n",
    ")\n",
    "inferenceEndTime = (\n",
    "    \"2022-03-20T23:55:00Z\"  # datetime: datetime for when to end the training\n",
    ")\n",
    "\n",
    "# Isolation Forest parameters\n",
    "contamination = 0.021\n",
    "num_estimators = 100\n",
    "max_samples = 256\n",
    "max_features = 1.0\n",
    "\n",
    "# MLFlow experiment\n",
    "artifact_path = \"isolationforest\"\n",
    "experiment_name = f\"/Shared/isolation_forest_experiment-{str(uuid.uuid1())}/\"\n",
    "model_name = \"isolation-forest-model\"\n",
    "model_version = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "745da74c-d517-4eaa-ae38-9addf40e89b6",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## Read data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "31d806fa-c594-474c-9d80-61ee30d5397c",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "df = (\n",
    "    spark.read.format(\"csv\")\n",
    "    .option(\"header\", \"true\")\n",
    "    .load(\n",
    "        \"wasbs://publicwasb@mmlspark.blob.core.windows.net/generated_sample_mvad_data.csv\"\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "958c2a81-2ca9-46f1-bc9d-792df28a1836",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "cast columns to appropriate data types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "18036afa-d769-4022-ae83-ec7c23db4c69",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "df = (\n",
    "    df.orderBy(timestampColumn)\n",
    "    .withColumn(\"timestamp\", F.date_format(timestampColumn, \"yyyy-MM-dd'T'HH:mm:ss'Z'\"))\n",
    "    .withColumn(\"sensor_1\", F.col(\"sensor_1\").cast(DoubleType()))\n",
    "    .withColumn(\"sensor_2\", F.col(\"sensor_2\").cast(DoubleType()))\n",
    "    .withColumn(\"sensor_3\", F.col(\"sensor_3\").cast(DoubleType()))\n",
    "    .drop(\"_c5\")\n",
    ")\n",
    "\n",
    "display(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "604de898-93a6-43cd-a84b-a66fa28b8528",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## Training data preparation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "f20d4eb8-36fe-4376-bba1-7b137aea2eaa",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# filter to data with timestamps within the training window\n",
    "df_train = df.filter(\n",
    "    (F.col(timestampColumn) >= trainingStartTime)\n",
    "    & (F.col(timestampColumn) <= trainingEndTime)\n",
    ")\n",
    "display(df_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "b093b8e1-9a81-4aed-b9a3-295e2012d3ba",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## Test data preparation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "2188efcc-0cd1-4c12-b041-634dc6739ec1",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# filter to data with timestamps within the inference window\n",
    "df_test = df.filter(\n",
    "    (F.col(timestampColumn) >= inferenceStartTime)\n",
    "    & (F.col(timestampColumn) <= inferenceEndTime)\n",
    ")\n",
    "display(df_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "5c3f110b-77c5-4655-b6ea-baea06a5b3bb",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## Train Isolation Forest model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "73c6305c-2037-4daf-99ee-f0632d483ca3",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "isolationForest = (\n",
    "    IsolationForest()\n",
    "    .setNumEstimators(num_estimators)\n",
    "    .setBootstrap(False)\n",
    "    .setMaxSamples(max_samples)\n",
    "    .setMaxFeatures(max_features)\n",
    "    .setFeaturesCol(\"features\")\n",
    "    .setPredictionCol(\"predictedLabel\")\n",
    "    .setScoreCol(\"outlierScore\")\n",
    "    .setContamination(contamination)\n",
    "    .setContaminationError(0.01 * contamination)\n",
    "    .setRandomSeed(1)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "984d1f15-5d07-4d29-8072-785bb6437df2",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "Next, we create an ML pipeline to train the Isolation Forest model. We also demonstrate how to create an MLFlow experiment and register the trained model.\n",
    "\n",
    "Note that MLFlow model registration is strictly only required if accessing the trained model at a later time. For training the model, and performing inferencing in the same notebook, the model object model is sufficient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "4dd284f3-9259-49a8-829c-70cb85159927",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "mlflow.set_experiment(experiment_name)\n",
    "with mlflow.start_run():\n",
    "    va = VectorAssembler(inputCols=inputCols, outputCol=\"features\")\n",
    "    pipeline = Pipeline(stages=[va, isolationForest])\n",
    "    model = pipeline.fit(df_train)\n",
    "    mlflow.spark.log_model(\n",
    "        model, artifact_path=artifact_path, registered_model_name=model_name\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "afb24955-8709-451c-91f3-37c0daa70bc0",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## Perform inferencing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "57cda5af-b090-4b6d-ad07-530519e0300e",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "Load the trained Isolation Forest Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "f44b9a1f-c2fe-4b5b-a318-4d6d73370978",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "model_uri = f\"models:/{model_name}/{model_version}\"\n",
    "model = mlflow.spark.load_model(model_uri)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "434dbf90-8013-4aef-aabf-51a293270082",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "Perform inferencing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "1e3f0c2f-7398-4807-9d94-d25d508877d7",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "df_test_pred = model.transform(df_test)\n",
    "display(df_test_pred)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "9d42c4b4-6a2c-4e1c-a67f-53842970a58d",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## ML interpretability\n",
    "In this section, we use ML interpretability tools to help unpack the contribution of each sensor to the detected anomalies at any point in time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "513e7d85-9fc1-4804-a20a-3621fb56ebc4",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# Here, we create a TabularSHAP explainer, set the input columns to all the features the model takes, specify the model and the target output column\n",
    "# we are trying to explain. In this case, we are trying to explain the \"outlierScore\" output.\n",
    "shap = TabularSHAP(\n",
    "    inputCols=inputCols,\n",
    "    outputCol=\"shapValues\",\n",
    "    model=model,\n",
    "    targetCol=\"outlierScore\",\n",
    "    backgroundData=F.broadcast(df_test),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Display the dataframe with `shapValues` column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "2f357274-a06b-4c4e-a4ae-3a65593a584a",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "shap_df = shap.transform(df_test_pred)\n",
    "display(shap_df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "d9ab077a-cccb-4ffb-9d8f-8ed5d19ae6fb",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# Define UDF\n",
    "vec2array = udf(lambda vec: vec.toArray().tolist(), ArrayType(FloatType()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "ff8345cf-fdbc-482d-bf92-5d560724d245",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# Here, we extract the SHAP values, the original features and the outlier score column. Then we convert it to a Pandas DataFrame for visualization.\n",
    "# For each observation, the first element in the SHAP values vector is the base value (the mean output of the background dataset),\n",
    "# and each of the following elements represents the SHAP values for each feature\n",
    "shaps = (\n",
    "    shap_df.withColumn(\"shapValues\", vec2array(F.col(\"shapValues\").getItem(0)))\n",
    "    .select(\n",
    "        [\"shapValues\", \"outlierScore\"] + inputCols + [timestampColumn, \"prediction\"]\n",
    "    )\n",
    "    .withColumn(\"sensor_1_localimp\", F.col(\"shapValues\")[1])\n",
    "    .withColumn(\"sensor_2_localimp\", F.col(\"shapValues\")[2])\n",
    "    .withColumn(\"sensor_3_localimp\", F.col(\"shapValues\")[3])\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "shaps_local = shaps.toPandas()\n",
    "shaps_local"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Retrieve local feature importances"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "6c213285-0c67-4712-8553-a19ba9c2b399",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "local_importance_values = shaps_local[[\"shapValues\"]]\n",
    "eval_data = shaps_local[inputCols]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "df4d6330-eb33-47bc-bdd9-9576ba947524",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# Removing the first element in the list of local importance values (this is the base value or mean output of the background dataset)\n",
    "list_local_importance_values = local_importance_values.values.tolist()\n",
    "converted_importance_values = []\n",
    "bias = []\n",
    "for classarray in list_local_importance_values:\n",
    "    for rowarray in classarray:\n",
    "        converted_list = rowarray.tolist()\n",
    "        bias.append(converted_list[0])\n",
    "        # remove the bias from local importance values\n",
    "        del converted_list[0]\n",
    "        converted_importance_values.append(converted_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, install libraries required for ML Interpretability analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install --upgrade raiwidgets interpret-community"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "3fd82f1d-d6ad-42a7-a3c2-410e54384123",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "from interpret_community.adapter import ExplanationAdapter\n",
    "\n",
    "adapter = ExplanationAdapter(inputCols, classification=False)\n",
    "global_explanation = adapter.create_global(\n",
    "    converted_importance_values, eval_data, expected_values=bias\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "2dabe48c-3c42-428f-8794-95716a551f7b",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# view the global importance values\n",
    "global_explanation.global_importance_values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "614e5806-a30e-46b1-ae6e-a69cc2922a4c",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# view the local importance values\n",
    "global_explanation.local_importance_values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "ba8a85b6-5874-45b8-93a3-242badf490e3",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# Defining a wrapper class with predict method for creating the Explanation Dashboard\n",
    "\n",
    "\n",
    "class wrapper(object):\n",
    "    def __init__(self, model):\n",
    "        self.model = model\n",
    "\n",
    "    def predict(self, data):\n",
    "        sparkdata = spark.createDataFrame(data)\n",
    "        return (\n",
    "            model.transform(sparkdata)\n",
    "            .select(\"outlierScore\")\n",
    "            .toPandas()\n",
    "            .values.flatten()\n",
    "            .tolist()\n",
    "        )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "8932f907-02f3-40c3-9de7-d746d147c252",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## Visualize results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Visualize anomaly results and feature contribution scores (derived from local feature importance)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "222bfd1b-2e5f-44d6-a3c5-abd52ac060e1",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "!pip install matplotlib\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "\n",
    "def visualize(rdf):\n",
    "    anoms = list(rdf[\"prediction\"] == 1)\n",
    "\n",
    "    fig = plt.figure(figsize=(26, 12))\n",
    "\n",
    "    ax = fig.add_subplot(611)\n",
    "    ax.title.set_text(f\"Multivariate Anomaly Detection Results\")\n",
    "    ax.plot(\n",
    "        rdf[timestampColumn],\n",
    "        rdf[\"sensor_1\"],\n",
    "        color=\"tab:orange\",\n",
    "        linestyle=\"solid\",\n",
    "        linewidth=2,\n",
    "        label=\"sensor_1\",\n",
    "    )\n",
    "    ax.grid(axis=\"y\")\n",
    "    _, _, ymin, ymax = plt.axis()\n",
    "    ax.vlines(\n",
    "        rdf[timestampColumn][anoms],\n",
    "        ymin=ymin,\n",
    "        ymax=ymax,\n",
    "        color=\"tab:red\",\n",
    "        alpha=0.2,\n",
    "        linewidth=6,\n",
    "    )\n",
    "    ax.tick_params(axis=\"x\", which=\"both\", bottom=False, labelbottom=False)\n",
    "    ax.set_ylabel(\"sensor1_value\")\n",
    "    ax.legend()\n",
    "\n",
    "    ax = fig.add_subplot(612, sharex=ax)\n",
    "    ax.plot(\n",
    "        rdf[timestampColumn],\n",
    "        rdf[\"sensor_2\"],\n",
    "        color=\"tab:green\",\n",
    "        linestyle=\"solid\",\n",
    "        linewidth=2,\n",
    "        label=\"sensor_2\",\n",
    "    )\n",
    "    ax.grid(axis=\"y\")\n",
    "    _, _, ymin, ymax = plt.axis()\n",
    "    ax.vlines(\n",
    "        rdf[timestampColumn][anoms],\n",
    "        ymin=ymin,\n",
    "        ymax=ymax,\n",
    "        color=\"tab:red\",\n",
    "        alpha=0.2,\n",
    "        linewidth=6,\n",
    "    )\n",
    "    ax.tick_params(axis=\"x\", which=\"both\", bottom=False, labelbottom=False)\n",
    "    ax.set_ylabel(\"sensor2_value\")\n",
    "    ax.legend()\n",
    "\n",
    "    ax = fig.add_subplot(613, sharex=ax)\n",
    "    ax.plot(\n",
    "        rdf[timestampColumn],\n",
    "        rdf[\"sensor_3\"],\n",
    "        color=\"tab:purple\",\n",
    "        linestyle=\"solid\",\n",
    "        linewidth=2,\n",
    "        label=\"sensor_3\",\n",
    "    )\n",
    "    ax.grid(axis=\"y\")\n",
    "    _, _, ymin, ymax = plt.axis()\n",
    "    ax.vlines(\n",
    "        rdf[timestampColumn][anoms],\n",
    "        ymin=ymin,\n",
    "        ymax=ymax,\n",
    "        color=\"tab:red\",\n",
    "        alpha=0.2,\n",
    "        linewidth=6,\n",
    "    )\n",
    "    ax.tick_params(axis=\"x\", which=\"both\", bottom=False, labelbottom=False)\n",
    "    ax.set_ylabel(\"sensor3_value\")\n",
    "    ax.legend()\n",
    "\n",
    "    ax = fig.add_subplot(614, sharex=ax)\n",
    "    ax.tick_params(axis=\"x\", which=\"both\", bottom=False, labelbottom=False)\n",
    "    ax.plot(\n",
    "        rdf[timestampColumn],\n",
    "        rdf[\"outlierScore\"],\n",
    "        color=\"black\",\n",
    "        linestyle=\"solid\",\n",
    "        linewidth=2,\n",
    "        label=\"Outlier score\",\n",
    "    )\n",
    "    ax.set_ylabel(\"outlier score\")\n",
    "    ax.grid(axis=\"y\")\n",
    "    ax.legend()\n",
    "\n",
    "    ax = fig.add_subplot(615, sharex=ax)\n",
    "    ax.tick_params(axis=\"x\", which=\"both\", bottom=False, labelbottom=False)\n",
    "    ax.bar(\n",
    "        rdf[timestampColumn],\n",
    "        rdf[\"sensor_1_localimp\"].abs(),\n",
    "        width=2,\n",
    "        color=\"tab:orange\",\n",
    "        label=\"sensor_1\",\n",
    "    )\n",
    "    ax.bar(\n",
    "        rdf[timestampColumn],\n",
    "        rdf[\"sensor_2_localimp\"].abs(),\n",
    "        width=2,\n",
    "        color=\"tab:green\",\n",
    "        label=\"sensor_2\",\n",
    "        bottom=rdf[\"sensor_1_localimp\"].abs(),\n",
    "    )\n",
    "    ax.bar(\n",
    "        rdf[timestampColumn],\n",
    "        rdf[\"sensor_3_localimp\"].abs(),\n",
    "        width=2,\n",
    "        color=\"tab:purple\",\n",
    "        label=\"sensor_3\",\n",
    "        bottom=rdf[\"sensor_1_localimp\"].abs() + rdf[\"sensor_2_localimp\"].abs(),\n",
    "    )\n",
    "    ax.set_ylabel(\"Contribution scores\")\n",
    "    ax.grid(axis=\"y\")\n",
    "    ax.legend()\n",
    "\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "9bacbb4a-6e3b-412a-8a8c-588ee2ee4e76",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "visualize(shaps_local)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When you run the cell above, you will see the following plots:\n",
    "\n",
    "![](https://mmlspark.blob.core.windows.net/graphics/notebooks/mvad_results_local_importances.jpg)\n",
    "\n",
    "- The first 3 plots above show the sensor time series data in the inference window, in orange, green, purple and blue. The red vertical lines show the detected anomalies (`prediction` = 1).. \n",
    "- The fourth plot shows the outlierScore of all the points, with the `minOutlierScore` threshold shown by the dotted red horizontal line\n",
    "- The last plot shows the contribution scores of each sensor to the `outlierScore` for that point."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Plot aggregate feature importance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10, 7))\n",
    "plt.bar(inputCols, global_explanation.global_importance_values)\n",
    "plt.ylabel(\"global importance values\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When you run the cell above, you will see the following global feature importance plot:\n",
    "\n",
    "![](https://mmlspark.blob.core.windows.net/graphics/notebooks/global_feature_importance.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Visualize the explanation in the ExplanationDashboard from https://github.com/microsoft/responsible-ai-widgets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "140602e6-908e-4b32-ab9c-49dd79705171",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# View the model explanation in the ExplanationDashboard\n",
    "from raiwidgets import ExplanationDashboard\n",
    "\n",
    "ExplanationDashboard(global_explanation, wrapper(model), dataset=eval_data)"
   ]
  }
 ],
 "metadata": {
  "application/vnd.databricks.v1+notebook": {
   "dashboards": [],
   "language": "python",
   "notebookMetadata": {
    "pythonIndentUnit": 4
   },
   "notebookName": "isolation-forest-training-inference-smlpr_final",
   "notebookOrigID": 595761003227917,
   "widgets": {}
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
